Goto

Collaborating Authors

 Passaic County


PeerCoPilot: A Language Model-Powered Assistant for Behavioral Health Organizations

Mo, Gao, Raman, Naveen, Chai, Megan, Peng, Cindy, Pagdon, Shannon, Jones, Nev, Shen, Hong, Swarbrick, Peggy, Fang, Fei

arXiv.org Artificial Intelligence

Behavioral health conditions, which include mental health and substance use disorders, are the leading disease burden in the United States. Peer-run behavioral health organizations (PROs) critically assist individuals facing these conditions by combining mental health services with assistance for needs such as income, employment, and housing. However, limited funds and staffing make it difficult for PROs to address all service user needs. To assist peer providers at PROs with their day-to-day tasks, we introduce PeerCoPilot, a large language model (LLM)-powered assistant that helps peer providers create wellness plans, construct step-by-step goals, and locate organizational resources to support these goals. PeerCoPilot ensures information reliability through a retrieval-augmented generation pipeline backed by a large database of over 1,300 vetted resources. We conducted human evaluations with 15 peer providers and 6 service users and found that over 90% of users supported using PeerCoPilot. Moreover, we demonstrated that PeerCoPilot provides more reliable and specific information than a baseline LLM. PeerCoPilot is now used by a group of 5-10 peer providers at CSPNJ, a large behavioral health organization serving over 10,000 service users, and we are actively expanding PeerCoPilot's use.


Robust Bayesian Optimisation with Unbounded Corruptions

Ezzerg, Abdelhamid, Bogunovic, Ilija, Knoblauch, Jeremias

arXiv.org Machine Learning

Bayesian Optimization is critically vulnerable to extreme outliers. Existing provably robust methods typically assume a bounded cumulative corruption budget, which makes them defenseless against even a single corruption of sufficient magnitude. To address this, we introduce a new adversary whose budget is only bounded in the frequency of corruptions, not in their magnitude. We then derive RCGP-UCB, an algorithm coupling the famous upper confidence bound (UCB) approach with a Robust Conjugate Gaussian Process (RCGP). We present stable and adaptive versions of RCGP-UCB, and prove that they achieve sublinear regret in the presence of up to $O(T^{1/2})$ and $O(T^{1/3})$ corruptions with possibly infinite magnitude. This robustness comes at near zero cost: without outliers, RCGP-UCB's regret bounds match those of the standard GP-UCB algorithm.


ProgressGym: Alignment with a Millennium of Moral Progress

Neural Information Processing Systems

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.



ProgressGym: Alignment with a Millennium of Moral Progress

Neural Information Processing Systems

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.


Ontology Creation and Management Tools: the Case of Anatomical Connectivity

Kokash, Natallia, de Bono, Bernard, Gillespie, Tom

arXiv.org Artificial Intelligence

Ontologies are essential for developing standardized vocabularies and defining relationships that help describe and interpret data from diverse sources. They are crucial for achieving semantic interoperability in many domains, allowing different systems to exchange data with a consistent and shared meaning. Ontologies are extensively used in biological and biomedical research Hoehndorf et al. (2015); Antezana et al. (2009), due to their ability to: provide standard identifiers for classes and relationships representing complex phenomena; include metadata to clarify the intended meaning of classes and relationships; include machine-readable definitions that allow computational access to class properties and relationships; standardize vocabulary across multiple data sources. Ontology-based data integration plays a vital role in neuroscience, where researchers synthesize knowledge across physiology, anatomy, molecular and developmental biology, cytology, and mathematical modeling to support accurate data representation, analysis, and simulation. A common challenge for many large neuroscience projects is the integration of data across a wide diversity of species, spatial resolutions, and temporal scales.


In silico study on the cytotoxicity against Hela cancer cells of xanthones bioactive compounds from Garcinia cowa: QSAR based on Graph Deep Learning, Network Pharmacology, and Molecular Docking

Son, Nguyen Manh, Vang, Pham Huu, Dung, Nguyen Thi, Thao, Nguyen Manh Ha. Ta Thi, Thuy, Tran Thi Thu, Giang, Phan Minh

arXiv.org Artificial Intelligence

Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Nighiado, Cau Giay, Hanoi, Vietnam Abstract: Cancer is recognized as a complex group of diseases, contributing to the highest global mortality rates, with increasing prevalence and a trend toward affecting younger populations. It is characterized by uncontrolled proliferation of abnormal cells, invasion of adjacent tissues, and metastasis to distant organs. Garcinia cowa, a traditional medicinal plant widely used in Southeast Asia, including Vietnam, is employed to treat fever, cough, indigestion, as a laxative, and for parasitic diseases. Numerous xanthone compounds isolated from this species exhibit a broad spectrum of biological activities, with some showing promise as anti-cancer and antimalarial agents. Network pharmacology analysis successfully identified key bioactive compounds Rubraxanthone, Garcinone D, Norcowanin, Cowanol, and Cowaxanthone--alongside their primary protein targets (TNF, CTNNB1, SRC, NFKB1, and MTOR), providing critical insights into the molecular mechanisms underlying their anti-cancer effects. The Graph Attention Network algorithm demonstrated superior predictive performance, achieving an R of 0.98 and an RMSE of 0.02 after data augmentation, highlighting its accuracy in predicting pIC50 values for xanthone-based compounds. Additionally, molecular docking revealed MTOR as a potential target for inducing cytotoxicity in HeLa cancer cells from Garcinia cowa. Keywords: Garcinia cowa, Hela, Network pharmacology, Graph neural network, Molecular docking I. Introduction Cancer is a complex group of diseases and one of the leading causes of mortality worldwide, characterized by the uncontrolled proliferation of abnormal cells, the ability to invade adjacent tissues, and metastasis to distant organs in the body [1, 2].


Sequence-based protein-protein interaction prediction and its applications in drug discovery

Charih, François, Green, James R., Biggar, Kyle K.

arXiv.org Artificial Intelligence

Aberrant protein-protein interactions (PPIs) underpin a plethora of human diseases, and disruption of these harmful interactions constitute a compelling treatment avenue. Advances in computational approaches to PPI prediction have closely followed progress in deep learning and natural language processing. In this review, we outline the state-of-the-art for sequence-based PPI prediction methods and explore their impact on target identification and drug discovery. We begin with an overview of commonly used training data sources and techniques used to curate these data to enhance the quality of the training set. Subsequently, we survey various PPI predictor types, including traditional similarity-based approaches, and deep learning-based approaches with a particular emphasis on the transformer architecture. Finally, we provide examples of PPI prediction in systems-level proteomics analyses, target identification, and design of therapeutic peptides and antibodies. We also take the opportunity to showcase the potential of PPI-aware drug discovery models in accelerating therapeutic development.


Memorization and Knowledge Injection in Gated LLMs

Pan, Xu, Hahami, Ely, Zhang, Zechen, Sompolinsky, Haim

arXiv.org Artificial Intelligence

Large Language Models (LLMs) currently struggle to sequentially add new memories and integrate new knowledge. These limitations contrast with the human ability to continuously learn from new experiences and acquire knowledge throughout life. Most existing approaches add memories either through large context windows or external memory buffers (e.g., Retrieval-Augmented Generation), and studies on knowledge injection rarely test scenarios resembling everyday life events. In this work, we introduce a continual learning framework, Memory Embedded in Gated LLMs (MEGa), which injects event memories directly into the weights of LLMs. Each memory is stored in a dedicated set of gated low-rank weights. During inference, a gating mechanism activates relevant memory weights by matching query embeddings to stored memory embeddings. This enables the model to both recall entire memories and answer related questions. On two datasets - fictional characters and Wikipedia events - MEGa outperforms baseline approaches in mitigating catastrophic forgetting. Our model draws inspiration from the complementary memory system of the human brain.


Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence

Sun, Yingying, A, Jun, Liu, Zhiwei, Sun, Rui, Qian, Liujia, Payne, Samuel H., Bittremieux, Wout, Ralser, Markus, Li, Chen, Chen, Yi, Dong, Zhen, Perez-Riverol, Yasset, Khan, Asif, Sander, Chris, Aebersold, Ruedi, Vizcaíno, Juan Antonio, Krieger, Jonathan R, Yao, Jianhua, Wen, Han, Zhang, Linfeng, Zhu, Yunping, Xuan, Yue, Sun, Benjamin Boyang, Qiao, Liang, Hermjakob, Henning, Tang, Haixu, Gao, Huanhuan, Deng, Yamin, Zhong, Qing, Chang, Cheng, Bandeira, Nuno, Li, Ming, E, Weinan, Sun, Siqi, Yang, Yuedong, Omenn, Gilbert S., Zhang, Yue, Xu, Ping, Fu, Yan, Liu, Xiaowen, Overall, Christopher M., Wang, Yu, Deutsch, Eric W., Chen, Luonan, Cox, Jürgen, Demichev, Vadim, He, Fuchu, Huang, Jiaxing, Jin, Huilin, Liu, Chao, Li, Nan, Luan, Zhongzhi, Song, Jiangning, Yu, Kaicheng, Wan, Wanggen, Wang, Tai, Zhang, Kang, Zhang, Le, Bell, Peter A., Mann, Matthias, Zhang, Bing, Guo, Tiannan

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.